OpenVINO™ toolkit: An open source AI toolkit that makes it easier to write once, deploy anywhere.

What's New in Version 2024.3

The OpenVINO™ toolkit 2024.3 release enhances generative AI (GenAI) accessibility with improved large language model (LLM) performance and expanded model coverage. It also boosts portability and performance for deployment anywhere: at the edge, in the cloud, or locally. The top features of this release are:

  • Models: Pre-optimized models are now available on Hugging Face*, making it easier for you to get started.
  • Optimizations: Significant improvement in LLM performance on discrete Intel® GPUs with the addition of multi-head attention (MHA), and enhancements from Intel® oneAPI Deep Neural Network Library (oneDNN).
  • Deployment: Improved CPU performance when serving LLMs, thanks to the inclusion of vLLM and continuous batching in model serving for the OpenVINO toolkit. vLLM is an easy-to-use open source library that supports efficient LLM inference and model serving; a minimal client sketch appears below.
Release Notes | View System Requirements
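To show what serving looks like from the client side, here is a minimal sketch that queries a locally running OpenVINO Model Server instance through an OpenAI-compatible REST API. The endpoint address, API path, and served model name are illustrative assumptions; they depend on how the server was configured and started.

```python
import requests

# Assumed local deployment: OpenVINO Model Server started with a
# continuous-batching configuration and an OpenAI-compatible endpoint.
BASE_URL = "http://localhost:8000/v3"  # assumed address and API version
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed served model name

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": "What is continuous batching?"}],
        "max_tokens": 128,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```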

Easier Model Access and Conversion


New Model Support

Support for Phi-3-mini, a family of AI models that takes advantage of the power of small language models for faster, more accurate, and cost-effective text processing.

Llama 3 optimizations for CPUs, built-in GPUs, and discrete GPUs for improved performance and efficient memory usage.
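As a concrete starting point, the following sketch loads one of the newly supported models through Optimum Intel, the Hugging Face integration for the OpenVINO toolkit. The model ID and prompt are illustrative assumptions; other supported causal language models, including Llama 3 checkpoints, can be substituted.

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Assumed model ID for illustration.
model_id = "microsoft/Phi-3-mini-4k-instruct"

# export=True converts the checkpoint to OpenVINO IR while loading.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Small language models are useful because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```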

Python*

Custom operations can now be written in Python, making it easier for Python developers to implement their own operations without switching to C++ (which remains supported). This capability lets you add specialized operations to any model; a minimal sketch follows.
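The sketch below shows the general shape of a Python custom operation, assuming the openvino.runtime.Op subclassing pattern: a pass-through Identity operation that overrides the methods a custom operation typically needs. Method names follow the OpenVINO Python API, but verify them against the documentation for your release.

```python
from openvino.runtime import DiscreteTypeInfo, Op

class Identity(Op):
    # Identifies the operation type to the OpenVINO runtime.
    class_type_info = DiscreteTypeInfo("Identity", "extension")

    def __init__(self, inputs=None):
        super().__init__(self)
        if inputs is not None:
            self.set_arguments(inputs)
        self.constructor_validate_and_infer_types()

    def get_type_info(self):
        return Identity.class_type_info

    def validate_and_infer_types(self):
        # The output keeps the input's element type and shape.
        self.set_output_type(0, self.get_input_element_type(0),
                             self.get_input_partial_shape(0))

    def clone_with_new_inputs(self, new_inputs):
        return Identity(new_inputs)

    def evaluate(self, outputs, inputs):
        # Pass the input tensor through unchanged.
        outputs[0].shape = inputs[0].shape
        inputs[0].copy_to(outputs[0])
        return True

    def has_evaluate(self):
        return True
```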

Generative AI and LLM Enhancements

Expanded model support and accelerated inference.


New Jupyter Notebooks

The collection of Jupyter Notebooks has been expanded to give better coverage for new models. The following noteworthy notebooks were added:

  • DynamiCrafter
  • YOLOv10*
  • Chatbot notebook with Phi-3 and Qwen2

Performance Improvements for LLMs

A GPTQ method for 4-bit weight compression was added to the Neural Network Compression Framework (NNCF) for more efficient inference and improved performance of compressed LLMs.
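As a rough sketch of how this is applied, the example below runs NNCF weight compression on an OpenVINO IR model. The model path and calibration data are placeholders, and the gptq parameter name is an assumption based on recent NNCF releases; check the NNCF documentation for the version you have installed.

```python
import nncf
import openvino as ov

# Load an OpenVINO IR model to compress ("llm.xml" is a placeholder path).
model = ov.Core().read_model("llm.xml")

# GPTQ minimizes compression error against calibration samples; in practice
# these would be tokenized prompts shaped like the model's inputs.
calibration_samples = []  # placeholder: supply real samples here
dataset = nncf.Dataset(calibration_samples)

# 4-bit symmetric weight compression with the GPTQ algorithm enabled.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,        # fraction of weights compressed to 4 bits
    dataset=dataset,
    gptq=True,        # assumed parameter name; verify for your NNCF version
)
ov.save_model(compressed, "llm_int4_gptq.xml")
```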

LLM performance is significantly improved, with reduced latency on both built-in and discrete GPUs.

Making Generative AI More Accessible for Real-World Scenarios

OpenVINO™ toolkit is an open source toolkit that accelerates AI inference with lower latency and higher throughput while maintaining accuracy, reducing model footprint, and optimizing hardware use. It streamlines AI development and integration of deep learning in domains like computer vision, large language models (LLMs), and generative AI.

AI Programming Workshops for OpenVINO Toolkit

Learn with like-minded AI developers by joining live and on-demand webinars focused on GenAI, LLMs, AI PC, and more, including code-based workshops using Jupyter* Notebook.

How It Works

Convert and optimize models trained using popular frameworks like TensorFlow* and PyTorch*. Deploy across a mix of Intel® hardware and environments, on-premises and on-device, in the browser, or in the cloud. A minimal conversion-and-inference sketch follows.
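For a sense of the workflow, here is a minimal sketch that converts a trained model and runs it with the OpenVINO runtime. The ONNX file path and input shape are placeholders; PyTorch and TensorFlow models can be passed to convert_model in much the same way.

```python
import numpy as np
import openvino as ov

# Convert a trained model to OpenVINO's in-memory representation
# ("model.onnx" is a placeholder path for this sketch).
model = ov.convert_model("model.onnx")

# "AUTO" lets the runtime choose the best available device (CPU, GPU, ...).
compiled = ov.compile_model(model, "AUTO")

# Run inference; the input shape depends on the converted model.
example_input = np.zeros((1, 3, 224, 224), dtype=np.float32)  # assumed shape
result = compiled(example_input)
print(next(iter(result.values())).shape)
```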

Resources

Get started with OpenVINO and all the resources you need to learn, try samples, see performance, and more.

Get started

Unlock the Power of LLMs

Review optimization and deployment strategies using the OpenVINO toolkit. Plus, use compression techniques with LLMs on your PC.

Intel® Geti™ Platform

This commercial software platform enables enterprise teams to develop vision AI models faster. With the platform, companies can build models with minimal data, and its OpenVINO integration makes it easier to deploy solutions at scale.

Explore the Capabilities of the Intel® Geti™ Platform

AI Inference Software & Solutions Catalog

When you are ready to go to market with your solution, explore ISV solutions built on OpenVINO. This ebook is designed to help you find the solution that best addresses your use case; it is organized into sections, such as banking or healthcare, to help you navigate the solutions table more easily.

Explore the AI Inference Catalog

Toolkit Add-Ons

Take advantage of add-ons that extend the possibilities of the toolkit and complement the functionality available in the core toolkit.

Benchmark Tool

Estimate deep learning inference performance on supported devices.

Dataset Management Framework

Use this add-on to build, transform, and analyze datasets.

Model Optimizer

This cross-platform, command-line tool facilitates the transition between training and deployment environments, performs static model analysis, and adjusts deep learning models for optimal performance on endpoint target devices.

Neural Network Compression Framework

Use this PyTorch-based framework for quantization-aware training.

Industry Model Zoos

Hugging Face* has a repository for the OpenVINO toolkit that provides resources and models aimed at optimizing deep learning models for inference on Intel hardware.

OpenVINO Model Server

This scalable inference server is for serving models optimized with the Intel® Distribution of OpenVINO™ toolkit.

Join Us on the Journey

Subscribe below to stay up to date with the latest Intel offerings.


Resources

Community and Support

Explore ways to get involved and stay up to date with the latest announcements.

Get Started

Optimize, fine-tune, and run comprehensive AI inference using the included model optimizer, runtime, and development tools.

Powered by oneAPI

The productive, smart path to freedom from the economic and technical burdens of proprietary alternatives for accelerated computing.